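Measures of Location

Median: MD = midpoint of the ordered data array

Mode: Mo = most frequent value

Midrange: $MR = \dfrac{\text{lowest value} + \text{highest value}}{2}$

Weighted Mean: $\bar{X} = \dfrac{\sum wX}{\sum w}$
Harmonic Mean: $HM = \dfrac{n}{\sum (1/X)}$
Geometric Mean: $GM = \sqrt[n]{(X_1)(X_2)(X_3)\cdots(X_n)}$
Quadratic Mean: $QM = \sqrt{\dfrac{\sum X^2}{n}}$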


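Measures of Variation

Range: R = highest value - lowest value

Variance: sample $s^2 = \dfrac{n(\sum X^2) - (\sum X)^2}{n(n-1)}$, population $\sigma^2 = \dfrac{\sum (X - \mu)^2}{N}$

Coefficient of Variation: sample $CVar = \dfrac{s}{\bar{X}} \cdot 100$, population $CVar = \dfrac{\sigma}{\mu} \cdot 100$

Variance (grouped data): $s^2 = \dfrac{n(\sum f \cdot X_m^2) - (\sum f \cdot X_m)^2}{n(n-1)}$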


Range Rule of Thumb

A rough estimate of the standard deviation is

$s \approx \dfrac{\text{range}}{4}$

• Smallest data value = $\bar{X} - 2s$

• Largest data value = $\bar{X} + 2s$


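Chebyshev’s Theorem

• The proportion of values within k standard deviations of the mean is at least $1 - \dfrac{1}{k^2}$
• If the k value is not given, the formula to find k is $k = \dfrac{L.V - \bar{X}}{s} = \dfrac{\bar{X} - S.V}{s}$, where S.V and L.V are the smallest and largest values of the interval.

Within 2 standard deviations: at least 75%
Within 3 standard deviations: at least 89%

The Empirical Rule (Normal)

Within 1 standard deviation: approximately 68%
Within 2 standard deviations: approximately 95%
Within 3 standard deviations: approximately 99.7%
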
Measures of Position

Standard Scores

$z = \dfrac{\text{value} - \text{mean}}{\text{standard deviation}}$

Population: $z = \dfrac{X - \mu}{\sigma}$
Sample: $z = \dfrac{X - \bar{X}}{s}$

If z is given, the formula to find X is

• $X = \bar{X} + zs$

Percentile

• $\text{Percentile} = \dfrac{(\text{number of values below } X) + 0.5}{\text{total number of values}} \cdot 100$

Finding a Data Value Corresponding to a Given Percentile: $c = \dfrac{n \cdot p}{100}$, where n = total number of values, p = percentile


Interquartile range (IQR)
• IQR = Q3 - Q1
Outlier

• A data value is an outlier if it is less than Q1 - 1.5(IQR) or greater than Q3 + 1.5(IQR)


Exploratory Data Analysis 

The five-number summary 

1. The lowest value of the data set (i.e., minimum) 
2. Q1

3. The median

4. Q3


5. The highest value of the data set (i.e., maximum) 


Constructing a Boxplot 
Step 1 Find the five-number summary.

Step 2 Draw a horizontal axis and place the scale on the axis. Start on or below the minimum data value and end on or above the maximum data value.

Step 3 Locate the lowest data value, Q1, the median, Q3, and the highest data value; then draw a box whose vertical sides go through Q1 and Q3, with a vertical line through the median.

Finally, draw a line from the minimum data value to the left side of the box and from the maximum data value to the right side of the box.


Information Obtained from a Boxplot 
1. a. If the median is near the center of the box, the distribution is approximately symmetric.
   b. If the median falls to the left of the center of the box, the distribution is positively skewed.
   c. If the median falls to the right of the center of the box, the distribution is negatively skewed.
2. a. If the lines are about the same length, the distribution is approximately symmetric.
   b. If the right line is larger than the left line, the distribution is positively skewed.
   c. If the left line is larger than the right line, the distribution is negatively skewed.


Important formulas for DRS 112 


Formula for the mean for individual data:

Sample: $\bar{X} = \dfrac{\sum X}{n}$
Population: $\mu = \dfrac{\sum X}{N}$

Formula for the mean for grouped data: $\bar{X} = \dfrac{\sum f \cdot X_m}{n}$

Formula for the weighted mean: $\bar{X} = \dfrac{\sum wX}{\sum w}$

Formula for the midrange: $MR = \dfrac{\text{lowest value} + \text{highest value}}{2}$

Formula for the range: R = highest value - lowest value

Formula for the variance for population data: $\sigma^2 = \dfrac{\sum (X - \mu)^2}{N}$

Formula for the variance for sample data (shortcut formula for the unbiased estimator): $s^2 = \dfrac{n(\sum X^2) - (\sum X)^2}{n(n-1)}$

Formula for the variance for grouped data: $s^2 = \dfrac{n(\sum f \cdot X_m^2) - (\sum f \cdot X_m)^2}{n(n-1)}$

Formula for the standard deviation for population data: $\sigma = \sqrt{\dfrac{\sum (X - \mu)^2}{N}}$

Formula for the standard deviation for sample data (shortcut formula): $s = \sqrt{\dfrac{n(\sum X^2) - (\sum X)^2}{n(n-1)}}$

Formula for the standard deviation for grouped data: $s = \sqrt{\dfrac{n(\sum f \cdot X_m^2) - (\sum f \cdot X_m)^2}{n(n-1)}}$

Formula for the coefficient of variation: $CVar = \dfrac{s}{\bar{X}} \cdot 100$ or $CVar = \dfrac{\sigma}{\mu} \cdot 100$

Range rule of thumb: $s \approx \dfrac{\text{range}}{4}$

Expression for Chebyshev’s theorem: The proportion of values from a data set that will fall within k
standard deviations of the mean will be at least $1 - \dfrac{1}{k^2}$,
where k is a number greater than 1.

Formula for the z score (standard score):

Sample: $z = \dfrac{X - \bar{X}}{s}$
Population: $z = \dfrac{X - \mu}{\sigma}$

Formula for the cumulative percentage: $\text{Cumulative \%} = \dfrac{\text{cumulative frequency}}{n} \cdot 100$

Formula for the percentile rank of a value X: $\text{Percentile} = \dfrac{(\text{number of values below } X) + 0.5}{\text{total number of values}} \cdot 100$

Formula for finding a value corresponding to a given percentile: $c = \dfrac{n \cdot p}{100}$

Formula for interquartile range: $IQR = Q_3 - Q_1$


Data Description


Measures of Central Tendency 
Measures of Variation 
Measures of Position 
Exploratory Data Analysis 


Measures of Central Tendency 


• A statistic is a characteristic or measure obtained by
using the data values from a sample.

• A parameter is a characteristic or measure obtained
by using all the data values from a specific
population.


The Mean


• The mean, also known as the arithmetic average, is
found by adding the values of the data and dividing
by the total number of values.

• The sample mean, denoted by $\bar{X}$, is calculated by
using sample data. The sample mean is a statistic.

• The population mean, denoted by $\mu$, is calculated by
using all the values in the population. The population
mean is a parameter.


EXAMPLE 3-1 Avian Flu Cases 


The number of confirmed flu cases for a 9-year 
period is shown. Find the mean. 


4 46 98 115 88 44 73 48 62 


EXAMPLE 3-2 Store Sales 


The data show the system wide sales (in millions) 
for U.S. franchises of a well-known donut store for a 5- 
year period. Find the mean. 


$221 $239 $262 $281 $318 
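
As a quick check of Examples 3-1 and 3-2, the sketch below computes both means with Python's standard `statistics` module. The variable names are illustrative and not part of the original notes.

```python
from statistics import mean

# Data copied from Example 3-1 and Example 3-2
flu_cases = [4, 46, 98, 115, 88, 44, 73, 48, 62]
donut_sales = [221, 239, 262, 281, 318]  # millions of dollars

# The mean is the sum of the values divided by the number of values
flu_mean = mean(flu_cases)
sales_mean = mean(donut_sales)

print(round(flu_mean, 1))    # 64.2 cases
print(round(sales_mean, 1))  # 264.2 (millions of dollars)
```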


The Median 


• The median is the halfway point in a data set: the midpoint of the
data array, which is the data arranged in order. The symbol for the
median is MD.


EXAMPLE 3-4 Tablet Sales 


The data show the number of tablet sales in 
millions of units for a 5-year period. Find the median of 
the data. 


108.2 17.6 159.8 69.8 222.6 


EXAMPLE 3—5 Tornadoes in the United States 


The number of tornadoes that have occurred in 
the United States over an 8-year period is as follows. 
Find the median. 


684, 764, 656, 702, 856, 1133, 1132, 1303 
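
The two median examples can be checked with a short sketch. It uses `statistics.median`, which sorts the data and, for an even number of values, averages the two middle values. Variable names are illustrative.

```python
from statistics import median

tablet_sales = [108.2, 17.6, 159.8, 69.8, 222.6]              # Example 3-4
tornadoes = [684, 764, 656, 702, 856, 1133, 1132, 1303]       # Example 3-5

# Odd number of values: the median is the middle value of the ordered data
tablet_md = median(tablet_sales)

# Even number of values: the median is the average of the two middle values
tornado_md = median(tornadoes)

print(tablet_md)   # 108.2
print(tornado_md)  # 810.0
```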


The Mode 


• The mode is the value that occurs most often in the
data set. It is sometimes said to be the most typical
case.

• A data set that has only one value that occurs with the
greatest frequency is said to be unimodal.

• If a data set has two values that occur with the same
greatest frequency, both values are considered to be
the mode and the data set is said to be bimodal.

• If a data set has more than two values that occur with
the same greatest frequency, each value is used as the
mode, and the data set is said to be multimodal.

• When no data value occurs more than once, the data
set is said to have no mode.


EXAMPLE 3-6 Public Libraries 


The data show the number of public libraries in a 
sample of eight states. Find the mode. 


114 77 21 101 311 77 159 382 


EXAMPLE 3-7 Licensed Nuclear Reactors 


The data show the number of licensed nuclear 
reactors in the United States for a recent 15-year period. 
Find the mode. 


104 104 104 104 104 107 109 109 
109 110 109 111 112 111 109 
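
A minimal sketch for Examples 3-6 and 3-7, assuming Python 3.8 or later for `statistics.multimode`. It returns every value tied for the greatest frequency, so a bimodal data set yields two values. Note that when no value repeats, `multimode` returns all values, which these notes describe as "no mode".

```python
from statistics import multimode

libraries = [114, 77, 21, 101, 311, 77, 159, 382]             # Example 3-6
reactors = [104, 104, 104, 104, 104, 107, 109, 109,
            109, 110, 109, 111, 112, 111, 109]                # Example 3-7

library_modes = multimode(libraries)  # one value: unimodal
reactor_modes = multimode(reactors)   # two values tied at 5 occurrences: bimodal

print(library_modes)  # [77]
print(reactor_modes)  # [104, 109]
```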


The Midrange 


• The midrange is a rough estimate of the middle. It is defined as the
sum of the lowest and highest values in the data set, divided by 2.
The symbol MR is used for the midrange.


EXAMPLE 3-12 Bank Failures 


The number of bank failures for a recent five-year 
period is shown. Find the midrange. 


3, 30, 148, 157, 71 


EXAMPLE 3-13 NFL Signing Bonuses 


Find the midrange of data for the NFL signing 
bonuses. The bonuses in millions of dollars are 


18, 14, 34.5, 10, 11.3, 10, 12.4, 10 
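
The midrange in Examples 3-12 and 3-13 needs only the lowest and highest values. The helper name `midrange` below is an illustrative choice, not something defined in the notes.

```python
def midrange(values):
    """Return (lowest value + highest value) / 2."""
    return (min(values) + max(values)) / 2

bank_failures = [3, 30, 148, 157, 71]                    # Example 3-12
nfl_bonuses = [18, 14, 34.5, 10, 11.3, 10, 12.4, 10]     # Example 3-13, millions of dollars

bank_mr = midrange(bank_failures)
nfl_mr = midrange(nfl_bonuses)

print(bank_mr)  # 80.0
print(nfl_mr)   # 22.25
```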


Mean (Grouped Data) 


EXAMPLE Salaries of CEOs 

The frequency distribution shows the salaries (in 
millions) for a specific year of the top 25 CEOs in the 
United States. Find the mean. 


[Frequency table: Class boundaries | Frequency; recoverable row: 25.5-30.5]

$\bar{X} = \dfrac{\sum f \cdot X_m}{n} = 27.60$ (in millions)


Median (Grouped Data) 


EXAMPLE Salaries of CEOs 

The frequency distribution shows the salaries (in 
millions) for a specific year of the top 25 CEOs in the 
United States. Find the median. 


[Frequency table: Class | Frequency]

Median

$MD = \dfrac{n/2 - cf}{f}(w) + L_m$

Where
n/2 = 25/2 = 12.5
Median class is 26-30.
cf = 7
f = 13
w = 5
$L_m$ = 25.5

$MD = \dfrac{12.5 - 7}{13}(5) + 25.5$
= 27.62 (in millions)


Mode (Grouped Data) 


• The mode for grouped data is the modal class. The
modal class is the class with the largest frequency.


EXAMPLE 3-9 Salaries of CEOs 

Find the modal class for the frequency 
distribution for the salaries of the top CEOs in the 
United States. And find the mode. 


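[Frequency table: Class | Frequency]

Mode

$Mo = \dfrac{\Delta_1}{\Delta_1 + \Delta_2}(w) + L_o$

Where
26-30 is the modal class.
$L_o$ = 25.5
w = 5
$\Delta_1 = f_0 - f_{-1} = 13 - 6 = 7$
$\Delta_2 = f_0 - f_{+1} = 13 - 4 = 9$

$Mo = \dfrac{7}{7 + 9}(5) + 25.5$
= 27.69 (in millions)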
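
The grouped-data median and mode formulas can be sketched as two small functions and checked against the CEO salary values quoted above (n = 25, cf = 7, f = 13, w = 5, lower boundary 25.5, neighboring frequencies 6 and 4). The function and parameter names are illustrative assumptions.

```python
def grouped_median(n, cf, f, w, lower_boundary):
    """MD = ((n/2 - cf) / f) * w + L_m for the median class."""
    return (n / 2 - cf) / f * w + lower_boundary

def grouped_mode(f0, f_prev, f_next, w, lower_boundary):
    """Mo = (d1 / (d1 + d2)) * w + L_o for the modal class."""
    d1 = f0 - f_prev   # modal frequency minus the frequency of the class before it
    d2 = f0 - f_next   # modal frequency minus the frequency of the class after it
    return d1 / (d1 + d2) * w + lower_boundary

ceo_median = grouped_median(n=25, cf=7, f=13, w=5, lower_boundary=25.5)
ceo_mode = grouped_mode(f0=13, f_prev=6, f_next=4, w=5, lower_boundary=25.5)

print(round(ceo_median, 2))  # 27.62 (in millions)
print(round(ceo_mode, 2))    # 27.69 (in millions)
```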



Exercise


The data shown are the total compensation (in 
millions of dollars) for the 50 top-paid CEOs for a 
recent year. Find the mean, median, mode, and 
midrange for the data. 


17.5 18.0 36.8 31.7 31.7 17.3 24.3 47.7 38.5 17.0 
23.2 16.5 25.1 17.4 18.0 37.6 19.7 21.4 28.6 21.6 
19.3 20.0 16.9 25.2 19.8 25.0 17.2 20.4 20.1 29.1 
19.1 25.2 23.2 25.9 23.2 41.7 24.0 16.8 26.8 31.4 
16.9 17.2 24.0 35.2 19.1 22.9 18.2 25.4 35.4 25.5 


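Mean

$\bar{X} = \dfrac{\sum X}{n} = \dfrac{1219.7}{50} = 24.39$

Median (data array in ascending order)

16.5 16.8 16.9 16.9 17.0 17.2 17.2 17.3 17.4 17.5
18.0 18.0 18.2 19.1 19.1 19.3 19.7 19.8 20.0 20.1
20.4 21.4 21.6 22.9 23.2 23.2 23.2 24.0 24.0 24.3
25.0 25.1 25.2 25.2 25.4 25.5 25.9 26.8 28.6 29.1
31.4 31.7 31.7 35.2 35.4 36.8 37.6 38.5 41.7 47.7

The median is the average of the 25th and 26th values:

$MD = \dfrac{23.2 + 23.2}{2} = 23.2$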

This frequency distribution represents the commission 
earned (in dollars) by 100 salespeople employed at several 
branches of a large chain store. Find the mean, median, and mode 
for the frequency distribution. 


Class limits

150-158
159-167
168-176
177-185
186-194
195-203
204-212


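Mean

$\bar{X} = \dfrac{\sum f \cdot X_m}{n}$
= 180.28 (in dollars)


Median

$MD = \dfrac{n/2 - cf}{f}(w) + L_m$

Where
n/2 = 100/2 = 50
cf = 41
f = 21
w = 9
$L_m$ = 176.5

$MD = \dfrac{50 - 41}{21}(9) + 176.5$
= 180.36 (in dollars)


Mode

$Mo = \dfrac{\Delta_1}{\Delta_1 + \Delta_2}(w) + L_o$

Where
$L_o$ = 176.5
w = 9
$\Delta_1 = f_0 - f_{-1} = 21 - 20 = 1$
$\Delta_2 = f_0 - f_{+1} = 21 - 20 = 1$

$Mo = \dfrac{1}{1 + 1}(9) + 176.5$
= 181 (in dollars)

This distribution is skewed to the left.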


The Weighted Mean 


• Sometimes, you must find the mean of a data set in
which not all values are equally represented.


EXAMPLE.1 Grade Point Average


• A student received an A in English Composition I (3
credits), a C in Introduction to Psychology (3 credits),
a B in Biology I (4 credits), and a D in Physical
Education (2 credits). Assuming A = 4 grade points, B
= 3 grade points, C = 2 grade points, D = 1 grade
point, and F = 0 grade points, find the student’s grade
point average.
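
A minimal sketch of the weighted mean for this example, taking the credits as the weights and the grade points as the values. The list and variable names are illustrative.

```python
# (course, grade points, credits) taken from the example
courses = [
    ("English Composition I", 4, 3),
    ("Introduction to Psychology", 2, 3),
    ("Biology I", 3, 4),
    ("Physical Education", 1, 2),
]

# Weighted mean = sum(w * X) / sum(w), with credits as the weights w
total_points = sum(points * credits for _, points, credits in courses)
total_credits = sum(credits for _, _, credits in courses)
gpa = total_points / total_credits

print(total_points, total_credits)  # 32 12
print(round(gpa, 2))                # 2.67
```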


Harmonic Mean 


• The harmonic mean (HM) is defined as the number
of values divided by the sum of the reciprocals of
each value.


EXAMPLE .2


• Suppose a person drove 100 miles at 40 miles per
hour and returned driving 50 miles per hour. The
average miles per hour is not 45 miles per hour,
which is found by adding 40 and 50 and dividing by
2. Because the distances are equal, the average is the
harmonic mean of the two speeds, about 44.4 miles per hour.
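
The sketch below applies the harmonic mean definition to the two speeds and cross-checks it against total distance divided by total time. The names are illustrative.

```python
def harmonic_mean(values):
    """Number of values divided by the sum of their reciprocals."""
    return len(values) / sum(1 / v for v in values)

speeds = [40, 50]            # miles per hour, each over the same 100 miles
hm = harmonic_mean(speeds)

# Cross-check: total distance / total time for the round trip
total_time = 100 / 40 + 100 / 50      # 2.5 h + 2.0 h
average_speed = 200 / total_time

print(round(hm, 2))             # 44.44
print(round(average_speed, 2))  # 44.44
```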


Geometric Mean 


• The geometric mean (GM) is defined as the nth root
of the product of n values.


• The geometric mean is useful in finding the average
of percentages, ratios, indexes, or growth rates.


EXAMPLE .3


• If a person receives a 20% raise after 1 year of
service and a 10% raise after the second year of
service, the average percentage raise per year is not
15% but 14.89%.
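
One way to reproduce the 14.89% figure is to take the geometric mean of the growth factors 1.20 and 1.10, then subtract 1. This is a sketch assuming Python 3.8 or later for `math.prod`. Converting percentages to growth factors is the usual convention for growth rates and is stated here as an assumption.

```python
import math

def geometric_mean(values):
    """nth root of the product of n values."""
    return math.prod(values) ** (1 / len(values))

raises_percent = [20, 10]
growth_factors = [1 + r / 100 for r in raises_percent]   # 1.20 and 1.10

gm = geometric_mean(growth_factors)
average_raise = (gm - 1) * 100

print(round(average_raise, 2))  # 14.89 (percent per year)
```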


Quadratic Mean 


• A useful mean in the physical sciences (such as
voltage) is the quadratic mean (QM), which is found
by taking the square root of the average of the squares
of each value.


EXAMPLE .4


• Find the quadratic mean of 8, 6, 3, 5, and 4.
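
A short sketch of the quadratic mean for this example. The function name is illustrative.

```python
import math

def quadratic_mean(values):
    """Square root of the average of the squared values."""
    return math.sqrt(sum(v ** 2 for v in values) / len(values))

data = [8, 6, 3, 5, 4]
qm = quadratic_mean(data)   # sqrt((64 + 36 + 9 + 25 + 16) / 5) = sqrt(30)

print(round(qm, 2))  # 5.48
```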


Exercise.1 


Final Grade 


An instructor grades exams, 20%; term paper, 
30%; final exam, 50%. A student had grades of 83, 72, 
and 90, respectively, for exams, term paper, and final 
exam. Find the student’s final average. Use the 
weighted mean. 


Exercise.2 


(a) A salesperson drives 300 miles round trip at 30 
miles per hour going to Chicago and 45 miles per hour 
returning home. Find the average miles per hour. 

(b) A bus driver drives the 50 miles to West Chester at 
40 miles per hour and returns driving 25 miles per hour. 
Find the average miles per hour. 

(c) A carpenter buys $500 worth of nails at $50 per 
pound and $500 worth of nails at $10 per pound. Find 
the average cost of 1 pound of nails. 


Exercise.3 

Find the geometric mean of each of these. 

(a) The growth rates of the Living Life Insurance 
Corporation for the past 3 years were 35, 24, and 18%. 
(b) A person received these percentage raises in salary 
over a 4-year period: 8, 6, 4, and 5%. 

(c) A stock increased each year for 5 years at these 
percentages: 10, 8, 12, 9, and 3%. 

(d) The price increases, in percentages, for the cost of
food in a specific geographic region for the past 3 years
were 1, 3, and 5.5%.


Exercise.4


• Find the quadratic mean of 8, 6, 3, 5, and 4.


• A frequency distribution is the organization of raw
data in table form, using classes and frequencies.


• Two types of frequency distributions that are most
often used are the categorical frequency distribution
and the grouped frequency distribution.


Categorical Frequency Distributions 


The categorical frequency distribution is used
for data that can be placed in specific categories, such as
nominal- or ordinal-level data. Examples include
political affiliation, religious affiliation, and major
field of study.


Grouped Frequency Distributions


When the range of the data is large, the data must
be grouped into classes that are more than one unit in
width, in what is called a grouped frequency
distribution.


• Lower class limit
• Upper class limit
• Class midpoint $X_m$: obtained by adding the lower and upper boundaries
and dividing by 2, or by adding the lower and upper limits and dividing by 2
• Class boundaries

Lower boundary = Lower limit - 0.5
Upper boundary = Upper limit + 0.5
(for data recorded in whole units, as in the class 26-30 with boundaries 25.5-30.5)


• The class width for a class in a frequency distribution
is found by subtracting the lower (or upper) class limit
of one class from the lower (or upper) class limit of the
next class.


• The class width can also be found by subtracting the
lower boundary from the upper boundary for any given
class.


The researcher must decide how many classes to use and the width of each
class.


1. There should be between 5 and 20 classes.

2. It is preferable but not absolutely necessary that the class width be an odd
number.

3. The classes must be mutually exclusive. Mutually exclusive classes have
non-overlapping class limits so that data cannot be placed into two classes.

4. The classes must be continuous.

5. The classes must be exhaustive. There should be enough classes to
accommodate all the data.

6. The classes must be equal in width. This avoids a distorted view of the
data.


• One exception occurs when a distribution has a class
that is open-ended. That is, the first class has no
specific lower limit, or the last class has no specific
upper limit. A frequency distribution with an open-ended
class is called an open-ended distribution.


Constructing a Grouped Frequency Distribution
• Step 1 Determine the classes.

Find the highest and lowest values.

Find the range.

Select the number of classes desired.

Find the width by dividing the range by the number of classes and
rounding up.

Select a starting point (usually the lowest value or any convenient
number less than the lowest value); add the width to get the lower limits.

Find the upper class limits.

Find the boundaries.
• Step 2 Tally the data.
• Step 3 Find the numerical frequencies from the tallies, and find the cumulative
frequencies.
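
The steps above can be sketched in a few lines of Python. The sketch assumes whole-number data (so boundaries are limits plus or minus 0.5), starts at the lowest value, and uses the 17 cigarette-tax increases from a later exercise in these notes with 7 classes. The function name and the choice of 7 classes are illustrative assumptions.

```python
import math

def grouped_frequency(data, num_classes):
    """Return rows of (lower limit, upper limit, lower boundary,
    upper boundary, frequency, cumulative frequency)."""
    low, high = min(data), max(data)
    width = math.ceil((high - low) / num_classes)   # range / classes, rounded up
    if width * num_classes <= high - low:           # make sure the highest value fits
        width += 1
    rows, cumulative = [], 0
    for i in range(num_classes):
        lower = low + i * width                     # add the width to get lower limits
        upper = lower + width - 1                   # whole-number data: one unit below next lower limit
        freq = sum(lower <= x <= upper for x in data)   # tally
        cumulative += freq
        rows.append((lower, upper, lower - 0.5, upper + 0.5, freq, cumulative))
    return rows

tax_increases = [60, 20, 40, 40, 45, 12, 34, 51, 30,
                 70, 42, 31, 69, 32, 8, 18, 50]
table = grouped_frequency(tax_increases, num_classes=7)

for row in table:
    print(row)   # first row: (8, 16, 7.5, 16.5, 2, 2)
```

With these inputs the range is 62, so the width is 9 and the class limits run from 8-16 up to 62-70.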


3.2 Measures of Variation


• A. Absolute variation

• B. Relative variation


Absolute Variation


(i) Range

(ii) Variance

(iii) Standard Deviation


Range
The range is the simplest of the three measures.

• R = highest value - lowest value


EXAMPLE 3-17 Top-Grossing Movies 


The data show a sample of the top-grossing movies in 
millions of dollars for a recent year. Find the range. 


409 386 150 117 73 70


Variance 


The population variance is the average of the 
squares of the distance each value is from the 
mean. 


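Variance formulas

Sample: $s^2 = \dfrac{\sum (X - \bar{X})^2}{n - 1} = \dfrac{n(\sum X^2) - (\sum X)^2}{n(n - 1)}$

Population: $\sigma^2 = \dfrac{\sum (X - \mu)^2}{N} = \dfrac{\sum X^2 - N\mu^2}{N}$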


Standard Deviation 


The population standard deviation is the square 
root of the variance. 


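Standard deviation formulas

Sample: $s = \sqrt{\dfrac{\sum (X - \bar{X})^2}{n - 1}}$

Population: $\sigma = \sqrt{\dfrac{\sum (X - \mu)^2}{N}}$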


EXAMPLE 3-20 Teacher Strikes 


The number of public school teacher strikes in 
Pennsylvania for a random sample of school years is 
shown. Find the sample variance and the sample 
standard deviation. 


Variance for Grouped Data 


Variance: $s^2 = \dfrac{n(\sum f \cdot X_m^2) - (\sum f \cdot X_m)^2}{n(n - 1)}$


Find the sample variance and the sample standard 
deviation for the frequency distribution of the data 
shown. The data represent the number of miles that 20 
runners ran during one week. 


Class Frequency 
5.5-10.5 1 
10.5-15.5 2 
15.5-20.5 3 
20.5-25.5 5 
25.5-30.5 4 
30.5-35.5 3 
35.5-40.5 2 
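
A minimal sketch of the grouped-data variance formula applied to the runners' table above. Class midpoints are computed from the boundaries. The variable names are illustrative.

```python
import math

# (lower boundary, upper boundary, frequency) from the table above
classes = [
    (5.5, 10.5, 1), (10.5, 15.5, 2), (15.5, 20.5, 3), (20.5, 25.5, 5),
    (25.5, 30.5, 4), (30.5, 35.5, 3), (35.5, 40.5, 2),
]

n = sum(f for _, _, f in classes)                               # 20 runners
sum_f_xm = sum(f * (lo + hi) / 2 for lo, hi, f in classes)      # sum of f * Xm
sum_f_xm2 = sum(f * ((lo + hi) / 2) ** 2 for lo, hi, f in classes)  # sum of f * Xm^2

# s^2 = [n * sum(f * Xm^2) - (sum(f * Xm))^2] / [n(n - 1)]
variance = (n * sum_f_xm2 - sum_f_xm ** 2) / (n * (n - 1))
std_dev = math.sqrt(variance)

print(sum_f_xm, sum_f_xm2)   # 490.0 13310.0
print(round(variance, 1))    # 68.7
print(round(std_dev, 1))     # 8.3
```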


Relative Variation 


Coefficient of Variation

Sample: $CVar = \dfrac{s}{\bar{X}} \cdot 100$

Population: $CVar = \dfrac{\sigma}{\mu} \cdot 100$


Range Rule of Thumb


The range can be used to approximate the standard
deviation. The approximation is called the range rule
of thumb.


A rough estimate of the standard deviation is

$s \approx \dfrac{\text{range}}{4}$

S.V = $\bar{X} - 2s$
L.V = $\bar{X} + 2s$

where S.V is the smallest data value and L.V is the largest data value.


Example 


The standard deviation for the data set 5, 8, 8, 9, 10, 12,
and 13 is 2.7, and the range is 13 - 5 = 8. The range rule
of thumb gives s ≈ 8/4 = 2. The range rule of thumb in this case
underestimates the standard deviation somewhat;
however, it is in the ballpark.


• Smallest data value = $\bar{X} - 2s$ = 9.3 - 2(2.7) = 3.9
• Largest data value = $\bar{X} + 2s$ = 9.3 + 2(2.7) = 14.7


• Notice that the smallest data value was 5, and
the largest data value was 13. Again, these are
rough approximations.


• For many data sets, almost all data values will
fall within 2 standard deviations of the mean.
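
The numbers in this example can be reproduced with the standard `statistics` module. `statistics.stdev` uses the sample (n - 1) formula. The variable names are illustrative.

```python
from statistics import mean, stdev

data = [5, 8, 8, 9, 10, 12, 13]

x_bar = mean(data)                       # sample mean
s = stdev(data)                          # sample standard deviation (divides by n - 1)
rule_of_thumb = (max(data) - min(data)) / 4   # range / 4

# Rough bounds for the smallest and largest data values
smallest = x_bar - 2 * s
largest = x_bar + 2 * s

print(round(x_bar, 1), round(s, 1))          # 9.3 2.7
print(rule_of_thumb)                         # 2.0
print(round(smallest, 1), round(largest, 1)) # 3.9 14.7
```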


Chebyshev’s Theorem 


• The variance and standard deviation of a variable can
be used to determine the spread, or dispersion, of a
variable.


• Chebyshev’s theorem: The proportion of values from
a data set that will fall within k standard deviations of
the mean will be at least $1 - 1/k^2$, where k is a
number greater than 1 (k is not necessarily an
integer).


This theorem states that at least three-fourths, or 75%,
of the data values will fall within 2 standard deviations
of the mean of the data set. This result is found by
substituting k = 2 in the expression

$1 - \dfrac{1}{k^2} = 1 - \dfrac{1}{2^2} = 1 - \dfrac{1}{4} = \dfrac{3}{4} = 75\%$


• For the example in which variable 1 has a mean of 70
and a standard deviation of 1.5, at least three-fourths,
or 75%, of the data values fall between 67 and 73.
These values are found by adding 2 standard
deviations to the mean and subtracting 2 standard
deviations from the mean, as shown:


• 70 + 2(1.5) = 70 + 3 = 73 and
• 70 - 2(1.5) = 70 - 3 = 67


• For variable 2, at least three-fourths, or 75%, of the
data values fall between 50 and 90. Again, these
values are found by adding and subtracting,
respectively, 2 standard deviations to and from the
mean.


• 70 + 2(10) = 70 + 20 = 90 and
• 70 - 2(10) = 70 - 20 = 50


This theorem states that at least eight-ninths, or 88.89%,
of the data values will fall within 3 standard deviations
of the mean of the data set. This result is found by
substituting k = 3 in the expression

$1 - \dfrac{1}{k^2} = 1 - \dfrac{1}{3^2} = 1 - \dfrac{1}{9} = \dfrac{8}{9} = 88.89\%$


For variable 1, at least eight-ninths, or 88.89%, of the
data values fall between 65.5 and 74.5, since


• 70 + 3(1.5) = 70 + 4.5 = 74.5 and
• 70 - 3(1.5) = 70 - 4.5 = 65.5


For variable 2, at least eight-ninths, or 88.89%, of the
data values fall between 40 and 100.


• 70 + 3(10) = 70 + 30 = 100 and
• 70 - 3(10) = 70 - 30 = 40


In summary, then, Chebyshev’s theorem states 


• At least three-fourths, or 75%, of all data values fall
within 2 standard deviations of the mean.


• At least eight-ninths, or 89%, of all data values fall
within 3 standard deviations of the mean.


EXAMPLE


• The mean price of houses in a certain neighborhood
is $50,000, and the standard deviation is $10,000.
Find the price range for which at least 75% of the
houses will sell.

• S.V = mean - 2s = 50,000 - 2(10,000) = 30,000

• L.V = mean + 2s = 50,000 + 2(10,000) = 70,000


EXAMPLE 


A survey of local companies found that the mean
amount of travel allowance for couriers was $0.25 per
mile. The standard deviation was $0.02. Using
Chebyshev’s theorem, find the minimum percentage
of the data values that will fall between $0.20 and
$0.30.


L.V = mean + ks, so

k = (L.V - mean) / s = (0.30 - 0.25) / 0.02 = 0.05 / 0.02 = 2.5

1 - 1/k^2 = 1 - 1/(2.5)^2 = 0.84 (or) 84%
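
The Chebyshev calculations in this section can be sketched as one small function, checked against k = 2, k = 3, and the courier example. The function name is an illustrative choice.

```python
def chebyshev_min_proportion(k):
    """Minimum proportion of values within k standard deviations of the mean."""
    if k <= 1:
        raise ValueError("Chebyshev's theorem requires k > 1")
    return 1 - 1 / k ** 2

# Courier example: find k from the interval, then apply the theorem
mean_allowance, sd, upper = 0.25, 0.02, 0.30
k = (upper - mean_allowance) / sd          # about 2.5 standard deviations

print(chebyshev_min_proportion(2))             # 0.75
print(round(chebyshev_min_proportion(3), 4))   # 0.8889
print(round(chebyshev_min_proportion(k), 2))   # 0.84
```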


The Empirical Rule 


Chebyshev’s theorem applies to any distribution. When a
distribution is bell-shaped (or what is called normal), the
following statements, which make up the empirical rule, are true.


• Approximately 68% of the data values will fall within
1 standard deviation of the mean.

• Approximately 95% of the data values will fall within
2 standard deviations of the mean.

• Approximately 99.7% of the data values will fall
within 3 standard deviations of the mean.


Smallest data value = $\bar{X} - 1s$ (approximately 68%)
Largest data value = $\bar{X} + 1s$ (approximately 68%)


Smallest data value = $\bar{X} - 2s$ (approximately 95%)
Largest data value = $\bar{X} + 2s$ (approximately 95%)


Smallest data value = $\bar{X} - 3s$ (approximately 99.7%)
Largest data value = $\bar{X} + 3s$ (approximately 99.7%)


Example 


Suppose that the scores on a national achievement exam
have a mean of 480 and a standard deviation of 90. If
these scores are normally distributed, then:


Approximately 68% will fall between 390 and 570
(480 + 90 = 570 and 480 - 90 = 390).


Approximately 95% of the scores will fall between 300
and 660
(480 + 2 · 90 = 660 and 480 - 2 · 90 = 300).


Approximately 99.7% will fall between 210 and 750
(480 + 3 · 90 = 750 and 480 - 3 · 90 = 210).


• Because the empirical rule requires that the
distribution be approximately bell shaped, the results
are more accurate than those of Chebyshev’s theorem,
which applies to all distributions.
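
The exam-score intervals follow directly from mean ± k standard deviations. The sketch below computes them; the function name and the dictionary layout are illustrative assumptions.

```python
def empirical_intervals(mean, sd):
    """Return {approximate percent: (low, high)} for 1, 2, and 3 standard deviations."""
    percents = {1: 68, 2: 95, 3: 99.7}
    return {pct: (mean - k * sd, mean + k * sd) for k, pct in percents.items()}

intervals = empirical_intervals(mean=480, sd=90)

for pct, (low, high) in intervals.items():
    print(f"about {pct}% of scores fall between {low} and {high}")
# about 68% of scores fall between 390 and 570
# about 95% of scores fall between 300 and 660
# about 99.7% of scores fall between 210 and 750
```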


Ex.1 


The increases (in cents) in cigarette taxes for 17 states in 
a 6-month period are 


60, 20, 40, 40, 45, 12, 34, 51, 30, 

70, 42, 31, 69, 32, 8, 18, 50 
Find the range, variance, and standard deviation for the 
data. Use the range rule of thumb to estimate the 


standard deviation. Compare the estimate to the actual 
standard deviation. 


Ex.2 


The mean of a distribution is 20 and the standard 

deviation is 2. Use Chebyshev’s theorem. 

• a. At least what percentage of the values will fall
between 10 and 30?


• b. At least what percentage of the values will fall
between 12 and 28?


Ex.3 and 4


• Calories in Bagels: The average number of calories in
a regular-size bagel is 240. If the standard deviation is
38 calories, find the range in which at least 75% of
the data will lie. Use Chebyshev’s theorem.


• Time Spent Online: Americans spend an average
of 3 hours per day online. If the standard deviation is
32 minutes, find the range in which at least 88.89% of
the data will lie. Use Chebyshev’s theorem.


Ex.5


Using Chebyshev’s theorem, solve these problems for a
distribution with a mean of 80 and a standard deviation
of 10.


• a. At least what percentage of values will fall
between 60 and 100?


• b. At least what percentage of values will fall
between 65 and 95?


Ex.6


The average full-time faculty member in a postsecondary
degree-granting institution works an average of 53
hours per week.


• a. If we assume the standard deviation is 2.8 hours,
what percentage of faculty members work more than
58.6 hours a week?


• b. If we assume a bell-shaped distribution, what
percentage of faculty members work more than 58.6
hours a week?


Measures of Position


In addition to measures of central tendency and
measures of variation, there are measures of position or
location.


• standard scores
• percentiles
• deciles
• quartiles


Standard Scores


• A standard score or z score tells how many standard
deviations a data value is above or below the mean
for a specific distribution of values.


• If a standard score is zero, then the data value is the
same as the mean.


$z = \dfrac{\text{value} - \text{mean}}{\text{standard deviation}}$

• $z = \dfrac{X - \mu}{\sigma}$ (for population)

• $z = \dfrac{X - \bar{X}}{s}$ (for sample)
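
A minimal sketch of the z score formula and its inverse. The numbers used (a score of 65 on a test with mean 50 and standard deviation 10) are hypothetical values chosen for illustration, not data from these notes.

```python
def z_score(value, mean, sd):
    """Number of standard deviations a value lies above (+) or below (-) the mean."""
    return (value - mean) / sd

def value_from_z(z, mean, sd):
    """Inverse of z_score: X = mean + z * sd."""
    return mean + z * sd

# Hypothetical test score, mean, and standard deviation
z = z_score(65, mean=50, sd=10)
x = value_from_z(z, mean=50, sd=10)

print(z)                        # 1.5 (the score is 1.5 standard deviations above the mean)
print(x)                        # 65.0
print(z_score(50, 50, 10))      # 0.0 (a value equal to the mean has z = 0)
```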


Percentiles 


• Percentiles are position measures used in educational
and health-related fields to indicate the position of an
individual in a group.


• Percentiles divide the data set into 100 equal groups.


• The percentile corresponding to a given value X is
computed by using the following formula:

$\text{Percentile} = \dfrac{(\text{number of values below } X) + 0.5}{\text{total number of values}} \cdot 100$


Finding a Data Value Corresponding
to a Given Percentile

$c = \dfrac{n \cdot p}{100}$

where n = total number of values

p = percentile
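
Both percentile formulas can be sketched as follows. The ten scores are hypothetical, and the handling of c (round up when c is not whole; average the c-th and (c+1)-th values when it is) is a common textbook convention assumed here rather than stated in these notes.

```python
import math

def percentile_rank(data, x):
    """((number of values below x) + 0.5) / (total number of values) * 100."""
    below = sum(v < x for v in data)
    return (below + 0.5) / len(data) * 100

def value_at_percentile(data, p):
    """Value at the p-th percentile, 0 < p < 100, using c = n * p / 100."""
    ordered = sorted(data)
    c = len(ordered) * p / 100
    if c == int(c):                       # c is whole: average the c-th and (c+1)-th values
        c = int(c)
        return (ordered[c - 1] + ordered[c]) / 2
    return ordered[math.ceil(c) - 1]      # otherwise round c up to the next position

scores = [18, 15, 12, 6, 8, 2, 3, 5, 20, 10]   # hypothetical test scores

rank_of_12 = percentile_rank(scores, 12)       # 6 values below 12
p25 = value_at_percentile(scores, 25)          # c = 2.5, rounds up to the 3rd value
p60 = value_at_percentile(scores, 60)          # c = 6, average of 6th and 7th values

print(round(rank_of_12))  # 65
print(p25)                # 5
print(p60)                # 11.0
```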


Quartiles 


Quartiles divide the distribution into four equal groups,
denoted by Q1, Q2, Q3.


Note that Q1 is the same as the 25th percentile; Q2 is
the same as the 50th percentile, or the median; Q3
corresponds to the 75th percentile.


Finding Data Values Corresponding
to Q1, Q2, and Q3


• Step 1 Arrange the data in order from lowest to
highest.

• Step 2 Find the median of the data values. This is the
value for Q2.

• Step 3 Find the median of the data values that fall
below Q2. This is the value for Q1.

• Step 4 Find the median of the data values that fall
above Q2. This is the value for Q3.


• In addition to dividing the data set into four groups,
quartiles can be used as a rough measure of
variability. This measure of variability which uses
quartiles is called the interquartile range and is the
range of the middle 50% of the data values.


• The interquartile range (IQR) is the difference
between the third and first quartiles.


IQR = Q3 - Q1
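
The four steps and the IQR can be sketched as below, together with the outlier limits Q1 - 1.5(IQR) and Q3 + 1.5(IQR). The data set 5, 8, 8, 9, 10, 12, 13 is reused from the range rule of thumb example purely for illustration, and the function name is an assumption.

```python
from statistics import median

def quartiles(data):
    """Return (Q1, Q2, Q3) using the median-of-halves method in Steps 1-4."""
    ordered = sorted(data)                 # Step 1
    n = len(ordered)
    q2 = median(ordered)                   # Step 2
    lower_half = ordered[: n // 2]         # values below the median position
    upper_half = ordered[(n + 1) // 2 :]   # values above the median position
    return median(lower_half), q2, median(upper_half)   # Steps 3 and 4

data = [5, 8, 8, 9, 10, 12, 13]
q1, q2, q3 = quartiles(data)
iqr = q3 - q1

# Values outside these limits would be flagged as outliers
lower_limit = q1 - 1.5 * iqr
upper_limit = q3 + 1.5 * iqr
outliers = [x for x in data if x < lower_limit or x > upper_limit]

print(q1, q2, q3, iqr)            # 8 9 12 4
print(lower_limit, upper_limit)   # 2.0 18.0
print(outliers)                   # []
```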